104 research outputs found

GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection

Authors: Gong Mackenzie; Liu Yan; Liu Yang; Peng Siyao; Yu Yue; Zeldes Amir; Zhu Yilun
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2019

In this paper we present GumDrop, Georgetown University's entry at the DISRPT 2019 Shared Task on automatic discourse unit segmentation and connective detection. Our approach relies on model stacking, creating a heterogeneous ensemble of classifiers, which feed into a metalearner for each final task. The system encompasses three trainable component stacks: one for sentence splitting, one for discourse unit segmentation and one for connective detection. The flexibility of each ensemble allows the system to generalize well to datasets of different sizes and with varying levels of homogeneity. Comment: Proceedings of Discourse Relation Parsing and Treebanking (DISRPT2019

arXiv.org e-Print Archive

Crossref

Adpositional Supersenses for Mandarin Chinese

Authors: Blodgett Austin; Liu Yang; Peng Siyao; Schneider Nathan; Zhao Yushi; Zhu Yilun
Publication venue: ScholarWorks@UMass Amherst
Publication date: 05/12/2018

This study adapts Semantic Network of Adposition and Case Supersenses (SNACS) annotation to Mandarin Chinese and demonstrates that the same supersense categories are appropriate for Chinese adposition semantics. We annotated 20 chapters of The Little Prince, with high interannotator agreement. The parallel corpus substantiates the applicability of construal analysis in Chinese and gives insight into the differences in construals between adpositions in two languages. The corpus can further support automatic disambiguation of adpositions in Chinese, and the common inventory of supersenses between the two languages can potentially serve cross-linguistic tasks such as machine translation

arXiv.org e-Print Archive

ScholarWorks@UMass Amherst

Mixture Proportion Estimation Beyond Irreducibility

Authors: Fjeldsted Aaron; Holland Darren; Landon George; Lintereur Azaree; Scott Clayton; Zhu Yilun
Publication date: 01/06/2023

The task of mixture proportion estimation (MPE) is to estimate the weight of a component distribution in a mixture, given observations from both the component and mixture. Previous work on MPE adopts the irreducibility assumption, which ensures identifiability of the mixture proportion. In this paper, we propose a more general sufficient condition that accommodates several settings of interest where irreducibility does not hold. We further present a resampling-based meta-algorithm that takes any existing MPE algorithm designed to work under irreducibility and adapts it to work under our more general condition. Our approach empirically exhibits improved estimation performance relative to baseline methods and to a recently proposed regrouping-based algorithm

arXiv.org e-Print Archive

Overview of AMALGUM – Large Silver Quality Annotations across English Genres

Authors: Behzad Shabnam; Gessler Luke D; Liu Yang; Peng Siyao; Zeldes Amir; Zhu Yilun
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2021

Corpus resources for Linguistics and NLP research on discourse phenomena, such as coreference and discourse trees, are limited by a lack of large scale, well-understood, annotated datasets: corpora are either very large (100M-10G tokens) but shallowly annotated and with unknown composition, or richly annotated, but smaller. Here, we present a resource that takes a middle path, combining some of the best features of scraped corpora - size, open licenses, lexical diversity - and high quality curated data for more interpretable inferences with complex annotations

ScholarWorks@UMass Amherst

Reversible Interlayer Sliding and Conductivity Changes in Adaptive Tetrathiafulvalene-Based Covalent Organic Frameworks.

Authors: Cai Songliang; Chatterjee Ruchira; Fan Jun; Garzón-Ruiz Andrés; Li Xinle; Liu Yi; Mao Haiyan; Navarro Amparo; Reimer Jeffrey A; Sun Bing; Yan Yilun; Yano Junko; Zhang Weiguang; Zheng Shengrun; Zhu Chenhui
Publication venue: eScholarship, University of California
Publication date: 01/04/2020

Ordered interlayer stacking is intrinsic in two-dimensional covalent organic frameworks (2D COFs) and has strong implications on COF's optoelectronic properties. Reversible interlayer sliding, corresponding to shearing of 2D layers along their basal plane, is an appealing dynamic control of both structures and properties, yet it remains unexplored in the 2D COF field. Herein, we demonstrate that the reversible interlayer sliding can be realized in an imine-linked tetrathiafulvalene (TTF)-based COF TTF-DMTA. The solvent treatment induces crystalline phase changes between the proposed staircase-like sql net structure and a slightly slipped eclipsed sql net structure. The solvation-induced crystallinity changes correlate well with reversible spectroscopic and electrical conductivity changes as demonstrated in oriented COF thin films. In contrast, no reversible switching is observed in a related TTF-TA COF, which differs from TTF-DMTA in terms of the absence of methoxy groups on the phenylene linkers. This work represents the first 2D COF example of which eclipsed and staircase-like aggregated states are interchangeably accessed via interlayer sliding, an uncharted structural feature that may enable applications such as chemiresistive sensors

eScholarship - University of California

GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation

Authors: Aoyama Tatsuya; Behzad Shabnam; Gessler Luke; Levine Lauren; Lin Jessica; Liu Yang Janet; Peng Siyao; Zeldes Amir; Zhu Yilun
Publication date: 02/06/2023

We present GENTLE, a new mixed-genre English challenge corpus totaling 17K tokens and consisting of 8 unusual text types for out-of-domain evaluation: dictionary entries, esports commentaries, legal documents, medical notes, poetry, mathematical proofs, syllabuses, and threat letters. GENTLE is manually annotated for a variety of popular NLP tasks, including syntactic dependency parsing, entity recognition, coreference resolution, and discourse parsing. We evaluate state-of-the-art NLP systems on GENTLE and find severe degradation for at least some genres in their performance on all tasks, which indicates GENTLE's utility as an evaluation dataset for NLP systems. Comment: Camera-ready for LAW-XVII collocated with ACL 202

arXiv.org e-Print Archive

Findings of the Shared Task on Multilingual Coreference Resolution

Authors: Konopík Miloslav; Nedoluzhko Anna; Novák Michal; Ogrodniczuk Maciej; Popel Martin; Pražák Ondřej; Sido Jakub; Zeman Daniel; Zhu Yilun; Žabokrtský Zdeněk
Publication date: 16/09/2022

This paper presents an overview of the shared task on multilingual coreference resolution associated with the CRAC 2022 workshop. Shared task participants were supposed to develop trainable systems capable of identifying mentions and clustering them according to identity coreference. The public edition of CorefUD 1.0, which contains 13 datasets for 10 languages, was used as the source of training and evaluation data. The CoNLL score used in previous coreference-oriented shared tasks was used as the main evaluation metric. There were 8 coreference prediction systems submitted by 5 participating teams; in addition, there was a competitive Transformer-based baseline system provided by the organizers at the beginning of the shared task. The winning system outperformed the baseline by 12 percentage points (in terms of the CoNLL scores averaged across all datasets for individual languages)

arXiv.org e-Print Archive